Abstract

The minute-by-minute moves of the Hang Seng Index (HSI) over a four-year
period are analysed and shown to possess statistical features similar to those of other
markets. Based on a mathematical theorem [S. B. Pope and E. S. C. Ching, Phys.
Fluids A 5, 1529 (1993)], we derive an analytic form for the probability distribution
function (PDF) of index moves from fitted functional forms of certain conditional
averages of the time series. Furthermore, following recent work by Stolovitzky and
Ching, we show that the observed PDF can be reproduced by a Langevin process 
with a move-dependent noise amplitude. The form of the Langevin equation can be 
determined directly from the market data. 



The availability of high-frequency economic time series, sampled every few
seconds, has generated a great deal of theoretical interest in the
econometrics and econophysics communities [1-4]. Attempts have been made
to devise models which produce time series with statistical characteristics
similar to those of real markets. Many of these studies are based on variants of
the Autoregressive Conditional Heteroskedasticity (ARCH) process, first
introduced by Engle [5] to analyse the quarterly consumer price index in the UK
over the period 1958 to 1977, and on the generalised ARCH (GARCH) process,
which offers a more flexible description of the volatility memory effect (i.e.,
lag structure) [6]. The nonlinearity in these regression models makes it possible
to generate probability distribution functions (PDFs) with fat tails, a
characteristic of financial data first noted by Mandelbrot [7]. However, since all these
processes are discrete in time, an immediate question is whether the
quality of the modelling depends on the time unit chosen, and whether there is a time
scale which is the most natural of all. Indeed, when the time step is not chosen
properly, one has either to introduce many terms in the regression expression
[the GARCH(p, q) model] to take memory effects into account, or else to miss some
of the important short-time statistics.

Fig. 1. Normalised linear two-point correlation function of the minute-by-minute
HSI moves over the four-year period 1994 to 1997. Note the weak oscillations of
the correlation function, indicating a slightly under-damped behaviour. Also shown
is the same correlation function calculated from the simulation using a Langevin
equation.
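To illustrate how the nonlinearity of ARCH-type models produces fat tails, here is a minimal sketch of a GARCH(1,1) process in Python with NumPy. The function names and parameter values (`omega`, `alpha`, `beta`) are hypothetical choices for illustration, not fitted to any market data; the GARCH coefficients `alpha` and `beta` are unrelated to the α used later in Eq. (6). The sample kurtosis of a Gaussian is 3, and values well above 3 indicate fat tails.

```python
import math
import numpy as np

def simulate_garch11(n, omega, alpha, beta, seed=0):
    """Simulate n steps of a GARCH(1,1) process:
    x_t = sigma_t * z_t,  sigma_t^2 = omega + alpha * x_{t-1}^2 + beta * sigma_{t-1}^2,
    with z_t i.i.d. standard Gaussian. Requires alpha + beta < 1."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    x = np.empty(n)
    var = omega / (1.0 - alpha - beta)  # start from the unconditional variance
    for t in range(n):
        x[t] = math.sqrt(var) * z[t]
        # the squared previous move feeds back into the next variance (the nonlinearity)
        var = omega + alpha * x[t] ** 2 + beta * var
    return x

def kurtosis(x):
    """Sample kurtosis <d^4>/<d^2>^2 with d = x - mean(x); equals 3 for a Gaussian."""
    d = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.mean(d**4) / np.mean(d**2) ** 2)

x = simulate_garch11(100_000, omega=1e-5, alpha=0.15, beta=0.80, seed=1)
gauss = np.random.default_rng(2).standard_normal(100_000)
print(f"GARCH(1,1) kurtosis: {kurtosis(x):.2f}, Gaussian kurtosis: {kurtosis(gauss):.2f}")
```

For Gaussian innovations the theoretical GARCH(1,1) kurtosis is 3[1 - (α+β)²]/[1 - (α+β)² - 2α²], about 5.6 for these illustrative values, although the time step of such a model is fixed by construction, which is exactly the limitation discussed above.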

An alternative approach, which partially circumvents the above difficulty, is
to model the market price move as a continuous-time process. Continuous-time
stochastic processes are quite familiar to physicists, ranging from simple
Brownian motion to fully developed turbulence. In fact, high-frequency
market price movements have much in common with the velocity or
temperature time series in turbulent flows [8-11], an analogy we exploit in this paper.
To put this statement in more quantitative terms, let us first summarise two
salient statistical features which seem to hold universally for all major stock
indices [4].

(i) Short linear correlation of price moves. For a given stock index S(t), one
may define the price move over a fixed time interval δ (say one minute),

$$x(t) = S(t) - S(t - \delta). \tag{1}$$


It has been shown that the linear correlation function C(τ) of the moves x(t)
decays to zero very rapidly, on the order of ten minutes. We have analysed
the minute-by-minute HSI data collected over the four-year
period from January 1994 to December 1997. Figure 1 shows the linear correlation
function C(τ) with δ = 1 min. It is seen that C(τ) becomes nearly zero after
about ten minutes. The decay is, however, not completely monotonic,
suggesting that the market is slightly under-damped.

Fig. 2. The (unnormalised) probability distribution function (PDF) of the
minute-by-minute HSI moves (N = 201704 events). Data in the first twenty minutes
of each morning trading session are discarded. Also shown are the PDF calculated
from Eq. (6) using the fitted conditional averages (solid line), and the PDF of the
time series from the simulation (crosses).
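A minimal sketch of how the moves (1) and a normalised linear correlation function can be computed from a sampled index, in Python with NumPy. The exact definition of C(τ) is not reproduced in this section, so the standard normalised autocorrelation C(τ) = ⟨(x(t) - x̄)(x(t+τ) - x̄)⟩ / ⟨(x - x̄)²⟩ is assumed; the function names and the synthetic AR(1) test series are hypothetical.

```python
import numpy as np

def moves(S, delta=1):
    """Price moves x(t) = S(t) - S(t - delta), Eq. (1), for an index sampled at unit time steps."""
    S = np.asarray(S, dtype=float)
    return S[delta:] - S[:-delta]

def linear_correlation(x, max_lag):
    """Assumed normalised linear correlation
    C(tau) = <(x(t) - m)(x(t + tau) - m)> / <(x - m)^2>, for tau = 0, ..., max_lag."""
    d = np.asarray(x, dtype=float) - np.mean(x)
    var = np.mean(d * d)
    return np.array([np.mean(d[: len(d) - k] * d[k:]) / var for k in range(max_lag + 1)])

# Synthetic check (hypothetical data): AR(1) moves x_t = 0.5 x_{t-1} + noise,
# for which the correlation should decay roughly as 0.5**tau.
rng = np.random.default_rng(0)
eps = rng.standard_normal(100_000)
x_true = np.empty_like(eps)
x_true[0] = eps[0]
for t in range(1, len(eps)):
    x_true[t] = 0.5 * x_true[t - 1] + eps[t]
S = np.concatenate(([0.0], np.cumsum(x_true)))  # index level built from the moves
C = linear_correlation(moves(S), max_lag=10)
print(np.round(C, 3))
```

Applied to the HSI series with δ = 1 min, the same function would give the curve of Fig. 1; a weak negative lobe before C(τ) settles at zero is the signature of the slightly under-damped behaviour mentioned above.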

(ii) Non-Gaussian distribution of price moves with fat tails. The fat tails of the
PDF P(x) of stock price moves are well known and have also been observed
for the movement of foreign currency exchange rates. Mandelbrot observed
that P(x) often decays as a power-law function of |x|; hence, combined
with (i), the stock index can be considered as a realisation of a Lévy walk [7].
From an analysis of high-frequency S&P 500 data, Mantegna and Stanley
showed that a truncated Lévy distribution offers a better description of the
PDF [2]. Figure 2 shows the PDF of the HSI minute-by-minute moves x(t)
(open circles) on a semi-log scale, collected over the same period as in Fig. 1. It
is seen that the decay at large |x| is well described by a simple exponential
function, as observed previously in Ref. [2]. For small |x|, a different behaviour
is seen. A noticeable feature is the cusp-like singularity at x = 0, which has so
far remained unexplained.
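A simple exponential decay at large |x| appears as a straight line on a semi-log plot. A minimal sketch of estimating the decay rate λ in P(x) ∝ exp(-λ|x|) by a least-squares fit to the logarithm of histogram counts, assuming NumPy; the fitting window and bin count are hypothetical choices, and the check uses synthetic Laplace-distributed data rather than HSI data.

```python
import numpy as np

def exponential_tail_rate(x, x_min, x_max, n_bins=20):
    """Fit log N(|x|) = const - lam * |x| to histogram counts N of |x| in [x_min, x_max]
    and return lam, the slope of the straight line seen on a semi-log plot."""
    ax = np.abs(np.asarray(x, dtype=float))
    counts, edges = np.histogram(ax, bins=n_bins, range=(x_min, x_max))
    centres = 0.5 * (edges[:-1] + edges[1:])
    keep = counts > 0                      # empty bins have no logarithm
    slope, _intercept = np.polyfit(centres[keep], np.log(counts[keep]), 1)
    return -slope

# Synthetic check: Laplace data with scale 2 has P(x) proportional to exp(-|x|/2), i.e. lam = 0.5.
rng = np.random.default_rng(0)
sample = rng.laplace(loc=0.0, scale=2.0, size=200_000)
lam = exponential_tail_rate(sample, x_min=2.0, x_max=12.0)
print(lam)
```

On real data the window should start well away from the cusp region near x = 0, where, as noted above, the behaviour is different.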

Fig. 3. (a) Conditional average r(x) (circles) and the linear fit (solid line). (b)
Conditional average q(x) (circles) and a nearly linear fit (solid line).

The peculiar form of the PDF seen in Fig. 2 has in fact been observed in
a physical context [10]. In analysing the temperature time series of thermal
convection in the "hard-turbulence" regime, Pope and Ching [11] considered
the following conditional averages for a twice-differentiable time series x(t),

$$r(x) = \langle \ddot{x} \mid x \rangle, \qquad q(x) = \langle \dot{x}^2 \mid x \rangle. \tag{3}$$



Here ⟨·|x⟩ denotes the average of a given quantity over those data points in
the time series where x(t) = x. From the stationarity of the PDF, they proved
that the PDF and the conditional averages r(x) and q(x) are related through
the following equation,

$$P(x) = \frac{C}{q(x)} \exp\left[ \int_0^x \frac{r(x')}{q(x')}\, dx' \right], \tag{4}$$

where C is a normalisation constant.



Using the turbulent temperature time series as input, they showed that
r(x) is generally linear in x with a negative coefficient. In the soft-turbulence
regime, q(x) is nearly constant, so that, from Eq. (4), the resulting PDF is Gaussian,
as observed. In the hard-turbulence regime, on the other hand, q(x) increases
with increasing |x|, giving rise to fat tails in the PDF.
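Equation (4) can also be evaluated numerically for any fitted r(x) and q(x). The following is a minimal sketch, assuming NumPy, a uniform grid and the trapezoid rule (our choices); the lower integration limit only changes a constant that is absorbed into C. The check uses the soft-turbulence case just described, r(x) = -Rx with constant q(x) = Q₀, for which Eq. (4) gives a Gaussian of variance Q₀/R; the numbers in the check are hypothetical.

```python
import numpy as np

def pdf_from_conditional_averages(r, q, xs):
    """Evaluate Eq. (4), P(x) = C / q(x) * exp( int r(x')/q(x') dx' ), on a uniform grid xs
    and fix C numerically so that P integrates to one."""
    xs = np.asarray(xs, dtype=float)
    dx = xs[1] - xs[0]
    ratio = r(xs) / q(xs)
    # cumulative trapezoid integral of r/q starting at xs[0]; a different lower limit
    # (e.g. 0, as in Eq. (4)) only adds a constant, which is absorbed into C
    integral = np.concatenate(([0.0], np.cumsum(0.5 * (ratio[1:] + ratio[:-1]) * dx)))
    log_p = integral - np.log(q(xs))
    p = np.exp(log_p - log_p.max())  # shift before exponentiating to avoid overflow
    norm = np.sum(0.5 * (p[1:] + p[:-1])) * dx
    return p / norm

R, Q0 = 0.5, 2.0  # hypothetical values for the check
xs = np.linspace(-20.0, 20.0, 4001)
p = pdf_from_conditional_averages(lambda x: -R * x, lambda x: np.full_like(x, Q0), xs)
var = Q0 / R
gauss = np.exp(-xs**2 / (2 * var)) / np.sqrt(2 * np.pi * var)
print(np.max(np.abs(p - gauss)))  # should be close to zero: constant q(x) gives a Gaussian
```

Replacing the constant q by a function that grows with |x| in the same call fattens the tails, which is the hard-turbulence behaviour described above.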

Figure 3 shows r(x) and q(x) computed from the HSI move time series. Indeed,
the shapes of these two functions are very similar to those obtained from the
temperature data in Ref. [11], although there are small differences. The data can be
fitted to the following functional forms,

$$r(x) = -Rx, \qquad q(x) = Q\,(x^2 + a^2)^{1/2}, \tag{5}$$

where R = 0.036 and Q = 0.36. The round-off parameter a cannot be
determined precisely because of the discreteness of the index data. Instead, we fix
its value from the normalisation condition. Substituting Eq. (5) into Eq. (4)
and carrying out the integration, we obtain



$$P(x) = \frac{C}{(x^2 + a^2)^{1/2}} \exp\left[ -\alpha\,(x^2 + a^2)^{1/2} \right], \tag{6}$$

where α = R/Q = 0.1 and C⁻¹ = 2K₀(αa), with K₀(u) being the modified
Bessel function of the second kind of order zero. The solid line in Fig. 2 is produced by
taking a² = 0.1. As can be seen, the agreement between the original data and
the fitted form (6) is satisfactory.
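The normalisation constant in Eq. (6) can be checked numerically. The sketch below (Python standard library only) compares a direct trapezoid integral of (x² + a²)^(-1/2) exp[-α(x² + a²)^(1/2)] over the real line with 2K₀(αa), evaluating K₀ from its integral representation K₀(z) = ∫₀^∞ exp(-z cosh t) dt; the substitution x = a sinh t maps one integral onto the other. It uses α = 0.1 and a² = 0.1 from the text; the function names, integration cut-offs and grid sizes are our own hypothetical choices.

```python
import math

def k0(z, t_max=20.0, n=100_000):
    """Modified Bessel function K_0(z), z > 0, from K_0(z) = int_0^inf exp(-z cosh t) dt,
    using the trapezoid rule on [0, t_max] (the integrand is negligible beyond t_max for the z used below)."""
    h = t_max / n
    total = 0.5 * (math.exp(-z) + math.exp(-z * math.cosh(t_max)))
    for i in range(1, n):
        total += math.exp(-z * math.cosh(i * h))
    return total * h

def unnormalised_p6(x, alpha, a):
    """Eq. (6) without the constant C: (x^2 + a^2)^(-1/2) * exp(-alpha (x^2 + a^2)^(1/2))."""
    s = math.hypot(x, a)
    return math.exp(-alpha * s) / s

def direct_integral(alpha, a, x_max=400.0, n=80_000):
    """Trapezoid integral of unnormalised_p6 over [-x_max, x_max], using its symmetry in x."""
    h = x_max / n
    total = 0.5 * (unnormalised_p6(0.0, alpha, a) + unnormalised_p6(x_max, alpha, a))
    for i in range(1, n):
        total += unnormalised_p6(i * h, alpha, a)
    return 2.0 * total * h

alpha = 0.036 / 0.36      # alpha = R / Q = 0.1
a = math.sqrt(0.1)        # a^2 = 0.1, as for the solid line in Fig. 2
inv_c_direct = direct_integral(alpha, a)
inv_c_bessel = 2.0 * k0(alpha * a)
print(inv_c_direct, inv_c_bessel)   # C^{-1} computed two ways
```

Because αa is small here, K₀(αa) ≈ -ln(αa/2) - γ_E grows only logarithmically as a → 0, which is how a small round-off parameter can sharpen the peak at x = 0 without destroying normalisability.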

It is worth noting that the functional form (6) we propose is quite different
from those of earlier studies [12]. The behaviour of the PDF around x = 0
is controlled by the parameter a: a sharp peak is produced when a becomes
very small. In this respect, a serves as a cut-off related to the
discreteness of the underlying asset price. For large |x|, P(x) crosses over to
an essentially exponential decay, P(x) ≈ C|x|⁻¹ exp(-α|x|). There is, however,
no scale invariance of the kind previously suggested.

The more challenging task is to devise a dynamical equation that generates a
time series with the same conditional averages as those of the market data.
This issue was considered recently by Stolovitzky and Ching [13]. They
studied a one-dimensional Langevin process defined by the following stochastic
differential equation,

$$m\frac{d^2x}{dt^2} + \gamma\frac{dx}{dt} = F(x) + [2\gamma k_B T(x)]^{1/2}\,\xi(t), \tag{7}$$

where ξ(t) is Gaussian white noise with ⟨ξ(t)⟩ = 0 and ⟨ξ(t)ξ(t')⟩ = δ(t - t').
The main difference between (7) and the usual Brownian process is the x-dependent
temperature (or noise amplitude) T(x). In the over-damped limit γ → ∞,
they showed that the conditional averages are given by

$$\langle \ddot{x} \mid x \rangle = F(x)/m, \qquad \langle \dot{x}^2 \mid x \rangle = k_B T(x)/m. \tag{8}$$



Combining this with Eq. (4), Stolovitzky and Ching showed that the PDF in this
limit is given by a generalised Boltzmann form,

$$P(x) = \frac{C}{T(x)} \exp\left[ \int_0^x \frac{F(x')}{k_B T(x')}\, dx' \right]. \tag{9}$$
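The step from Eqs. (4) and (8) to Eq. (9) is a direct substitution. Setting r(x) = F(x)/m and q(x) = k_B T(x)/m in Eq. (4),

```latex
\frac{r(x)}{q(x)} = \frac{F(x)/m}{k_B T(x)/m} = \frac{F(x)}{k_B T(x)},
\qquad
P(x) = \frac{C}{k_B T(x)/m}\,
       \exp\left[\int_0^x \frac{F(x')}{k_B T(x')}\,dx'\right],
```

and absorbing the constant factor m/k_B into the normalisation constant C gives Eq. (9).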



The significance of the above result is as follows. Assuming that a given time
series is generated by a Langevin process (7), one can determine the
effective force F(x) and the effective temperature T(x) uniquely, up to an overall
time constant, by computing the conditional averages from the data. The PDF of the
Langevin time series is then identical to the PDF of the original data by construction.
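Computing the conditional averages from data is the key practical step. The following is a minimal sketch, assuming NumPy, that estimates r(x) = ⟨ẍ|x⟩ and q(x) = ⟨ẋ²|x⟩ by binning a uniformly sampled series on its value and approximating the time derivatives with central finite differences. How the derivatives and bins were chosen for the HSI data is not stated here, so these are our assumptions, as is the value m = 1 used to turn the averages into F(x) and k_B T(x) through Eq. (8). The check uses the hypothetical series x(t) = sin t, for which ẍ = -x and ẋ² = 1 - x² exactly.

```python
import numpy as np

def conditional_averages(x, dt, bins):
    """Estimate r(x) = <x'' | x> and q(x) = <x'^2 | x> from a uniformly sampled series.

    Time derivatives use central finite differences (an assumed choice); interior samples
    are grouped by the value of x into the given bins and averaged within each bin.
    Returns bin centres, r and q (NaN where a bin is empty)."""
    x = np.asarray(x, dtype=float)
    bins = np.asarray(bins, dtype=float)
    xdot = (x[2:] - x[:-2]) / (2.0 * dt)
    xddot = (x[2:] - 2.0 * x[1:-1] + x[:-2]) / dt**2
    idx = np.digitize(x[1:-1], bins) - 1          # bin index of each interior sample
    n_bins = len(bins) - 1
    ok = (idx >= 0) & (idx < n_bins)              # drop samples outside the bin range
    counts = np.bincount(idx[ok], minlength=n_bins).astype(float)
    r_sum = np.bincount(idx[ok], weights=xddot[ok], minlength=n_bins)
    q_sum = np.bincount(idx[ok], weights=xdot[ok] ** 2, minlength=n_bins)
    with np.errstate(invalid="ignore", divide="ignore"):
        r, q = r_sum / counts, q_sum / counts
    centres = 0.5 * (bins[:-1] + bins[1:])
    return centres, r, q

# Check on a smooth series (hypothetical data): for x(t) = sin t, r(x) = -x and q(x) = 1 - x^2.
dt = 1e-3
t = np.arange(0.0, 200.0, dt)
centres, r, q = conditional_averages(np.sin(t), dt, np.linspace(-0.9, 0.9, 19))
m = 1.0                        # assumed value of the free constant in Eq. (8)
F_est, kT_est = m * r, m * q   # F(x) and k_B T(x) at the bin centres, via Eq. (8)
```

For minute-sampled index data the finite differences are coarse, so in practice one would fit smooth forms such as Eq. (5) to the binned averages rather than use them directly.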

The γ → ∞ limit is well suited to the analysis of the HSI data since, as we have
seen from the two-point correlation function C(τ), any memory of the direction
of the move decays to zero rapidly. It is then natural to drop the inertia
(i.e. mass) term in Eq. (7) altogether. Performing the scaling
t → γt and setting k_B = 1, we can cast Eq. (7) in the form

$$\frac{dx}{dt} = F(x) + [2T(x)]^{1/2}\,\xi(t). \tag{10}$$

The effective force and the effective temperature are related to the conditional
averages through Eq. (8). The parameter m can be chosen to include a short-time
relaxation effect as seen in Fig. 1 [14].

We have simulated the Langevin equation (10) using an Euler integration
scheme with Δt = 0.1 min. The functions F(x) and T(x) are
determined from the fitted conditional averages (5). The PDF of the simulated
minute-by-minute moves, with the same number of events as the original data, is
shown in Fig. 2 (crosses). In Fig. 4 we plot both the HSI data (daily close)
and the simulated index data with an artificial annual yield of 10%. The gross
features of the two data sets appear similar to the naked eye, although
the simulated data set has no daily or weekly breaks.

One important feature missing from the Langevin equation (7) is the
long-term volatility correlation, which may last from a few days to several
weeks or longer. The simulated time series has essentially the same relaxation
time, on the order of a few minutes in our case, for the linear move correlation
and for the volatility correlation. It is, however, possible to introduce volatility
persistence into the simulation by hand. The effect of a nonstationary volatility
on the PDF of the time series is a subject of current investigation.
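The Euler integration of Eq. (10) described above can be sketched as follows, assuming NumPy. This is a minimal Euler-Maruyama (Itô) sketch, not the simulation code used for Figs. 2 and 4. It takes m = k_B = 1 (an assumption), so that Eqs. (5) and (8) give F(x) = -Rx and T(x) = Q(x² + a²)^(1/2). Time is measured in the units of Eq. (10), whose overall constant is left unspecified. The function names and the numbers of walkers and steps are hypothetical. Under the Itô reading, the stationary Fokker-Planck solution of Eq. (10) is Eq. (9), which for these F and T is Eq. (6), so the check compares the simulated ⟨x²⟩ with the value implied by Eq. (6).

```python
import numpy as np

R, Q, A2 = 0.036, 0.36, 0.1  # R and Q from Eq. (5); a^2 = 0.1 as used for Fig. 2

def force(x):
    """F(x) = m r(x) = -R x, taking m = 1 (assumed)."""
    return -R * x

def temperature(x):
    """T(x) = m q(x) / k_B = Q (x^2 + a^2)^(1/2), taking m = k_B = 1 (assumed)."""
    return Q * np.sqrt(x * x + A2)

def simulate_langevin(n_paths, n_steps, dt=0.1, seed=0):
    """Euler-Maruyama (Ito) integration of Eq. (10), dx/dt = F(x) + [2 T(x)]^(1/2) xi(t),
    for n_paths independent walkers started at x = 0; returns their final positions."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n_paths)
    for _ in range(n_steps):
        dw = rng.standard_normal(n_paths) * np.sqrt(dt)   # Wiener increments over dt
        x = x + force(x) * dt + np.sqrt(2.0 * temperature(x)) * dw
    return x

def eq6_second_moment(x_max=400.0, n=400_001):
    """<x^2> under the normalised PDF of Eq. (6), by the trapezoid rule."""
    xs = np.linspace(-x_max, x_max, n)
    s = np.sqrt(xs**2 + A2)
    p = np.exp(-(R / Q) * s) / s
    w = np.full(n, xs[1] - xs[0])
    w[[0, -1]] *= 0.5  # trapezoid end-point weights
    return np.sum(w * xs**2 * p) / np.sum(w * p)

# 3000 steps of 0.1 is roughly ten relaxation times 1/R, enough to reach the stationary state.
x_final = simulate_langevin(n_paths=20_000, n_steps=3_000)
print(np.mean(x_final**2), eq6_second_moment())
```

Running many independent walkers in parallel is only a convenience for the check; a single long path sampled every time unit would give the minute-by-minute series compared with the data in Fig. 2.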
Fig. 4. Daily close of the HSI over a four-year period (heavy line) and the simulated
index data (dotted line).